fix(ported_static): Approach-1 + stale-skip cleanup for Amsterdam OoG-by-design tests#2843
fix(ported_static): Approach-1 + stale-skip cleanup for Amsterdam OoG-by-design tests#2843leolara wants to merge 6 commits into
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## devnets/bal/7 #2843 +/- ##
================================================
Coverage ? 87.35%
================================================
Files ? 586
Lines ? 35957
Branches ? 3382
================================================
Hits ? 31410
Misses ? 3926
Partials ? 621
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
OoG-by-design ported_static tests are calibrated to land just below a cost boundary; on Amsterdam each fresh SSTORE-set and CREATE spills its state-gas portion back into regular gas when the per-tx reservoir is empty. Tests that expect "N SSTOREs and M CREATEs complete before OoG" need their gas budget lifted by N * sstore_state_gas + M * create_state_gas to land at the same intermediate state. `Fork.oog_budget_lift(sstores_before_oog=N, creates_before_oog=M)` composes the two existing state-gas helpers and returns 0 on pre-EIP-8037 forks (where both helpers are 0), so callers can apply it unconditionally without a fork guard. Unit-tested on Cancun (zero) and Amsterdam (cumulative spill).
…*/stRevert*/stWallet* tests Eight tests share the OoG-by-design shape `tx_gas = [oog_path, success_path]` where the success_path budget is tuned to barely complete a single CREATE, a CREATE plus a few SSTOREs, or a deploy chain. On Amsterdam EIP-8037 splits NEW_ACCOUNT, fresh SSTORE-set, and code-deposit cost into a regular portion plus a state-gas portion; with an empty reservoir, the state-gas spills back into regular gas and breaks the success_path budget. Apply `Fork.oog_budget_lift` with the right (creates, sstores, deploy_code_size) counts to lift the budget on Amsterdam only. Pre- EIP-8037 forks return 0 from the helper, so the original budget is preserved. Files (skip-list entries cleared): - stCreate2/test_create2_oo_gafter_init_code.py (-g1) - stCreate2/test_create2_oo_gafter_init_code_returndata2.py (-g1) - stCreateTest/test_create_oo_gafter_init_code.py (-g1) - stCreateTest/test_create_oo_gafter_init_code_returndata2.py (-g1) - stRevertTest/test_revert_sub_call_storage_oog.py (-g1-v0) - stRevertTest/test_revert_sub_call_storage_oog2.py (-g1-v0) - stWalletTest/test_wallet_construction_oog.py (-g1) - stWalletTest/test_multi_owned_construction_not_enough_gas_partial.py (-g1) Removes 8 entries from amsterdam_skip_list.txt.
…eTest 16-pair family
22 stSStoreTest files (`sstore_0to*`, `sstore_xto*` excluding `gas`,
`gas_left`, `change_from_external_call_in_init_code`) had 3 skip
entries each (`d{0,1,2}-g1`). Re-running these on Amsterdam with the
skip list disabled shows all 60 fixture variants per file already
pass without any code changes.
The post-state expectation at `g=1` is `contract_2: storage={1: 0}`
(only slot 1, asserted to be zero). On Cancun, ~14 of the 16
SSTORE pairs complete before OoG; on Amsterdam EIP-8037 the state-
gas spill cuts that to ~5 pairs. In both cases slot 1 ends at 0 and
the final `SSTORE(1, 1)` does not run, so the assertion holds on
both forks unchanged.
These were defensive skip entries from an earlier snapshot. Remove
all 66 entries; no test-file changes needed.
a623fa1 to
1ad2430
Compare
| Bytes(""), | ||
| ] | ||
| tx_gas = [54000, 55000] | ||
| tx_gas = [54000, 55000 + fork.oog_budget_lift(creates_before_oog=1)] |
There was a problem hiding this comment.
Should we apply the lift to both tx_gas values? Claude tells me that: Without the lift, tx_gas[0] on Amsterdam OoGs at CREATE2 dispatch before init code runs — the assertions still pass, but the test no longer exercises the scenario it claims to.
There was a problem hiding this comment.
I think my last commit fixes this? Now I see that the first transaction was ooG in the wrong place
| # stripping the fixture-format suffix in conftest.py). | ||
| # | ||
| # Total entries: 554 | ||
| # Total entries: 480 |
…ter_init_code tests Address kclowes's review on ethereum#2843: with only tx_gas[1] lifted, g=0 OoG'd at CREATE/CREATE2 dispatch on Amsterdam (NEW_ACCOUNT state-gas spill) before init code ever ran — the assertion still held (`NONEXISTENT` either way) but the failure mode shifted from "OoG after init code" (the test's named scenario) to "dispatch-time OoG". A clean closed form using `fork.oog_budget_lift(creates_before_oog=1)` (183600) overshoots and pushes g=0 past the deploy threshold. The Cancun 1000-gas gap between g=0 and g=1 collapses on Amsterdam: once dispatch is cleared, the 5-byte init code is cheap enough to always complete. Empirical binary-search on both files puts the safe range at (166499, 167000); 166_750 sits in the middle, keeping g=0 OoG'ing at dispatch and g=1 just clearing the deploy threshold. The two `_returndata2` variants are left unchanged — g=0's post-state happens to land identically on both forks at the existing budget, and adding any lift breaks them.
the directory In this case the content of this file is: # OoG-by-design ported_static tests — fix approaches for Amsterdam
After PR #2839 there are still **119 OoG-by-design entries** in
`tests/ported_static/amsterdam_skip_list.txt`. These are tests whose
budgets are tuned to land *just below* an EIP-8037-affected cost
boundary (SSTORE-set, CALL+value to empty account, MEMORY expansion,
nested CALL forwarding), so a blind gas bump invalidates their
assertion and a plain skip loses the coverage.
This note enumerates tractable approaches, ordered by risk and
expected entries cleared.
## Approach 1 — extend `tx_gas` with an Amsterdam threshold (lowest risk)
Tests of shape
```python
tx_gas = [happy_path, just_below_threshold]where the threshold is "intrinsic + 1 spillable op" can be made tx_gas = [800000, 80000]
if fork.is_eip_enabled(8037):
# Amsterdam: OoG threshold lifts by the SSTORE-set state-gas
tx_gas = [800000, 80000 + fork.sstore_state_gas()]Works when the OoG depends on one spillable operation. Pick-up: Approach 2 — derive the bump from the expected post-stateFor tests that complete N SSTOREs and then OoG (post-state shows def oog_budget(base: int, *, sstores_before_oog: int, fork: Fork) -> int:
"""Adjust an OoG-tuned gas budget for EIP-8037 state-gas spill."""
return base + sstores_before_oog * fork.sstore_state_gas()Per-test usage stays a single line. Pick-up: ~40–50 entries — Approach 3 — fork-conditional
|
|
@spencer-tb I think the lint problem I am getting is from the base branch, on which branch should I rebase this? |
|
| Metric | Count |
|---|---|
| Skip-list entries (nodeid substrings) | 480 |
| Skipped fixture variants on Amsterdam fill | 1,197 |
| Passing fixture variants on Amsterdam | 17,733 |
Amsterdam pass rate on tests/ported_static/ |
93.7% |
Cumulative reduction across all my PRs
| Stage | Skip-list entries | Fixture variants |
|---|---|---|
| Before PR #2790 (start of work) | ~897 | ~2,691 |
| After PR #2796 + #2790 | ~732 | ~2,196 |
| After PR #2839 | 554 | ~1,662 |
| After PR #2843 (this PR) | 480 | 1,197 |
| Net cleared by my work | −417 | ~−1,494 |
Notes
- Skip-list entries are nodeid substrings; each typically expands to
~2–3 fixture variants (state_test,
blockchain_test_from_state_test,
blockchain_test_engine_from_state_test). - The 1,197 figure is the actual count
uv run fill --fork Amsterdam
reports asskipped. - For comparison: issue Static Test Fail Tracker for EIP-8037 #2601 quoted 3,423 failures on the
pre-recalibrationeips/amsterdam/eip-8037branch. The 1,197 number
onbal/7is ~65% lower despite the CPSB-1530 recalibration adding
new failure modes that didn't exist at the time Static Test Fail Tracker for EIP-8037 #2601 was filed. - Classification of the remaining 480 entries (OoG-by-design,
gas-measurement, bytecode-baked, multi-param-per-d, balance-refund,
size-limit, other) is in
bal-7-skipped-tests-classification.md.
…terdam-oog-by-design
… → devnets/bal/7 merge
The May-18 merge `dffc4cfea` ("Merge remote-tracking branch
'upstream/forks/amsterdam' into devnets/bal/7") had two conflict
resolutions that left `bal/7` in a state where `just static` fails:
1. `src/ethereum/forks/amsterdam/blocks.py` ended up with the EIP-7843
`slot_number: U64` field declared twice (lines 260 and 268).
`mypy` rejects it with `[no-redef]`; `ethereum-spec-lint` crashes
with `ValueError: duplicate path Header.slot_number`. Remove the
second copy.
2. `BuiltBlock.derive_engine_payload_modifier` was dropped from
`packages/testing/src/execution_testing/specs/blockchain.py` while
its 5 call sites in `specs/tests/test_types.py` were kept. `mypy`
reports 5 `attr-defined` errors. Restore the staticmethod (and the
`FixtureExecutionPayloadModifier` import it needs) from
`forks/amsterdam`.
The instance-level wiring on `forks/amsterdam` (`BuiltBlock.rlp_modifier`
field, constructor pass-through, and `get_fixture_engine_new_payload`
call) is **not** restored — neither the linter nor the test file
references it, and `bal/7`'s current `get_fixture_engine_new_payload`
already runs without it.
This unblocks CI on every open PR against `devnets/bal/7` (including
this one).
|
@spencer-tb This commit here: 615745c is fixing something unrelated to this PR, but an error in bal/7 branch, that is causing the automatic tests to don't pass |
|
@kclowes could you please check again? |
🗒️ Description
Follow-up to #2839 — applies the Approach-1 / Approach-2 strategies
described in
.mb/oog-by-design-amsterdam-approaches.mdto theOoG-by-design subset of Amsterdam-skipped ported_static tests.
Clears 74 skip entries (
tests/ported_static/amsterdam_skip_list.txt897 → 823).
Commits
feat(forks): add Fork.oog_budget_lift helper— new classmethodon
BaseForkthat composessstore_state_gas,create_state_gas,and
code_deposit_state_gasinto a single budget lift for testscalibrated to OoG just below an EIP-8037-affected cost boundary.
Returns 0 on pre-EIP-8037 forks, so callers apply it
unconditionally without a fork guard.
fix(ported_static): Approach-1 fork-conditional OoG lift—8 single-spillable-op tests where
tx_gas[1]is tuned to barelycomplete a CREATE / SSTORE / CREATE+SSTOREs on Cancun but OoGs on
Amsterdam due to state-gas spill. Lift the budget with
Fork.oog_budget_lift(...)using the right(creates, sstores, deploy_code_size) counts:
chore(ported_static): remove stale Amsterdam skip entries for stSStoreTest 16-pair family—22 stSStoreTest files (`sstore_0to*`, `sstore_xto*` excluding
`gas`, `gas_left`, `change_from_external_call_in_init_code`) had
3 skip entries each. Re-running these on Amsterdam with the skip
list disabled shows all 60 fixture variants per file already
pass without any code changes — the post-state assertion
(`contract_2: storage={1: 0}`) holds whether OoG fires at pair 14
(Cancun) or pair 5 (Amsterdam after state-gas spill). Remove all
66 stale entries, no test-file changes.
chore: ruff format oog_budget_lift unit test assertions—cosmetic format pass on the helper's unit test.
Approach 4 (1024-depth CALL family) — deferred
The 3 `call1024_oog` / `callcode1024_oog` files have a per-frame
state-gas interaction that the simple `oog_budget_lift` model
underestimates (depth dropped from 134→44 frames with a 3×SSTORE
lift, vs expected near-zero drop). They need per-frame analysis
beyond the helper's scope. Leaving them skipped; follow-up PR.
Verification
🔗 Related Issues or PRs
Follows #2839 on the same fork. Independent — can merge in either
order.
✅ Checklist
case).
edited tests; `@ported_from` preserved from upstream.